L03 ggplot 2
Data Visualization (STAT 302)

# Load package(s)
library(tidyverse)

# Load data
cdc <- read_delim('data/cdc.txt', delim = '|') |>
  mutate(
    genhlth = factor(
      genhlth,
      levels = c('excellent', 'very good', 'good', 'fair', 'poor')
    )
  )
Overview
The goal of this lab is to continue unlocking the power of ggplot2 by constructing and experimenting with a few basic plots.
Datasets
We will be using the BRFSS survey data, which was introduced in the last lab. The data are supplied in the cdc.txt file, which should be in your data/ subdirectory. If it is not there, download the previous lab to get this file. As a reminder, the dataset contains 20,000 complete observations (records) of 9 variables (fields), described below.
- genhlth: How would you rate your general health? (excellent, very good, good, fair, poor)
- exerany: Have you exercised in the past month? (1 = yes, 0 = no)
- hlthplan: Do you have some form of health coverage? (1 = yes, 0 = no)
- smoke100: Have you smoked at least 100 cigarettes in your lifetime? (1 = yes, 0 = no)
- height: height in inches
- weight: weight in pounds
- wtdesire: desired weight in pounds
- age: age in years
- gender: m for males and f for females
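Because genhlth is ordinal, the setup code converts it to a factor with explicit levels. Without them, factor() sorts the categories alphabetically, and plots would show them in that order. The sketch below illustrates the difference on a few made-up responses. The vectors `responses` and `genhlth_toy` are hypothetical and not taken from cdc.txt.

```r
# Hypothetical toy responses (not rows from cdc.txt)
responses <- c('good', 'poor', 'excellent', 'good', 'fair')

# Default factor(): levels come out in alphabetical order
# (excellent, fair, good, poor), which scrambles the health scale
levels(factor(responses))

# Explicit levels, as in the setup code: the best-to-worst order is kept,
# and the unused level 'very good' is still retained
genhlth_toy <- factor(
  responses,
  levels = c('excellent', 'very good', 'good', 'fair', 'poor')
)
levels(genhlth_toy)
```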
Exercise 1
Using the cdc dataset, we want to look at the relationship between height and weight. Recreate the following graphics as precisely as possible.
Plot 1
Hints:
- Transparency is 0.2
- Minimal theme
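Here is a minimal sketch of how these two hints translate into ggplot2 code. It assumes Plot 1 is a scatterplot with height on the x-axis and weight on the y-axis, so check the axes against the target graphic. To keep the sketch runnable without cdc.txt, it uses a small simulated tibble `toy`, which is hypothetical and not the BRFSS data. With the lab data you would pass `cdc` instead.

```r
library(tidyverse)

# Simulated stand-in for cdc (hypothetical; NOT the BRFSS data)
set.seed(302)
toy <- tibble(height = rnorm(200, mean = 67, sd = 4)) |>
  mutate(weight = 5 * height - 170 + rnorm(200, sd = 25))

p1 <- ggplot(toy, aes(x = height, y = weight)) +
  # alpha is set outside aes(), so every point gets the constant 0.2
  geom_point(alpha = 0.2) +
  theme_minimal()
p1
```

Setting alpha outside `aes()` is what makes it a fixed transparency. Putting it inside `aes()` would instead map it as a variable and add a legend.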
Plot 2
Hints:
- linewidth = 0.7
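`linewidth` sets the thickness of line-based geoms. It replaced `size` for lines in ggplot2 3.4.0. The sketch below assumes that the line in Plot 2 is a linear fit drawn with `geom_smooth(method = 'lm')` on top of the points. That choice is a guess, so the target may use a different smoothing method or layer. It again uses hypothetical simulated data in place of `cdc`.

```r
library(tidyverse)

# Simulated stand-in for cdc (hypothetical; NOT the BRFSS data)
set.seed(302)
toy <- tibble(height = rnorm(200, mean = 67, sd = 4)) |>
  mutate(weight = 5 * height - 170 + rnorm(200, sd = 25))

p2 <- ggplot(toy, aes(x = height, y = weight)) +
  geom_point(alpha = 0.2) +
  # assumption: a linear fit; linewidth = 0.7 controls the line thickness
  geom_smooth(method = 'lm', formula = y ~ x, linewidth = 0.7)
p2
```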
Plot 3
Hints:
- bins set to 35
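In ggplot2's 2D binning layers, `bins` sets how many bins are used along each axis. The sketch below assumes Plot 3 is a binned heatmap of height and weight and uses `geom_bin_2d()`. If the target uses hexagons instead, `geom_hex(bins = 35)` takes the same argument but requires the hexbin package. The simulated `toy` data are hypothetical.

```r
library(tidyverse)

# Simulated stand-in for cdc (hypothetical; NOT the BRFSS data)
set.seed(302)
toy <- tibble(height = rnorm(200, mean = 67, sd = 4)) |>
  mutate(weight = 5 * height - 170 + rnorm(200, sd = 25))

p3 <- ggplot(toy, aes(x = height, y = weight)) +
  # assumption: rectangular 2D bins; fill shows the count in each bin
  geom_bin_2d(bins = 35)
p3
```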
Plot 4
Hints:
- Use a stat layer, not a geom layer
- geom = "polygon"
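A stat layer can be drawn with a geom other than its default by passing `geom =`. One common pattern, assumed here for Plot 4, is a 2D kernel density estimate drawn as filled polygons. In this pattern, fill is mapped to the computed density level through `after_stat(level)`. The target may differ, and ggplot2 also offers `stat_density_2d_filled()` for a similar result. The simulated data are hypothetical.

```r
library(tidyverse)

# Simulated stand-in for cdc (hypothetical; NOT the BRFSS data)
set.seed(302)
toy <- tibble(height = rnorm(200, mean = 67, sd = 4)) |>
  mutate(weight = 5 * height - 170 + rnorm(200, sd = 25))

p4 <- ggplot(toy, aes(x = height, y = weight)) +
  # stat layer drawn as polygons; 'level' is a variable computed by the stat
  stat_density_2d(aes(fill = after_stat(level)), geom = 'polygon')
p4
```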
Exercise 2
Using the cdc_means dataset derived from the cdc dataset, recreate the following graphic as precisely as possible.
Hints:
- Hex color code #56B4E9
- 95% confidence intervals (1.96 or qnorm(0.975))
- Some useful values: 0.1, 0.7
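A 95% two-sided interval leaves 2.5% in each tail, so the normal critical value is the 97.5th percentile, `qnorm(0.975)`, which is about 1.96. The interval is then mean ± 1.96 × se. The sketch below assumes the target is a bar chart of mean weight loss by general health with error bars. It further assumes that 0.7 is the bar width and 0.1 is the error-bar width. Both readings of the hinted values are guesses. `toy_means` holds made-up placeholder numbers with the same columns as `cdc_means` (genhlth, mean, se), not results from the data.

```r
library(tidyverse)

# Made-up placeholder summaries shaped like cdc_means (genhlth, mean, se);
# these numbers are hypothetical, NOT results from the BRFSS data
toy_means <- tibble(
  genhlth = factor(c('excellent', 'very good', 'good', 'fair', 'poor')),
  mean = c(10, 12, 15, 14, 9),
  se = c(0.5, 0.4, 0.4, 0.6, 1.0)
) |>
  # same reordering as the lab's wrangling code: largest mean first
  mutate(genhlth = fct_reorder(genhlth, desc(mean)))

# 97.5th percentile of the standard normal, about 1.96
z <- qnorm(0.975)

p_means <- ggplot(toy_means, aes(x = genhlth, y = mean)) +
  # assumption: bars of width 0.7 in the hinted hex color
  geom_col(fill = '#56B4E9', width = 0.7) +
  # assumption: error bars of width 0.1 spanning mean +/- z * se
  geom_errorbar(aes(ymin = mean - z * se, ymax = mean + z * se), width = 0.1)
p_means
```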
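# data wrangling
# calculate mean and standard error (se) of desired weight loss for the CI
cdc_means <- cdc |>
  # desired weight loss = current weight minus desired weight
  mutate(wtloss = weight - wtdesire) |>
  group_by(genhlth) |>
  summarize(
    mean = mean(wtloss),
    se = sd(wtloss) / sqrt(n())
  ) |>
  # order health categories by descending mean weight loss
  mutate(genhlth = fct_reorder(factor(genhlth), desc(mean)))

Exercise 3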
Using the cdc_weight_95ci dataset derived from the cdc dataset, recreate the following graphic as precisely as possible.
Hints:
- Useful values: 0.1, 0.5
- Need to know CI formula
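For reference, here is the interval that the wrangling code below is built around. For a group with n observations, sample mean $\bar{x}$, and sample standard deviation $s$, the 95% t-interval is

$$
\bar{x} \pm t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}
$$

In the code, $s/\sqrt{n}$ is `se`, and $t_{0.975,\,n-1} \cdot s/\sqrt{n}$ is `moe`, computed with `qt(0.975, n() - 1)`. The natural reading is therefore that the bounds are `mean_wt - moe` and `mean_wt + moe`. Unlike Exercise 2, which uses the normal critical value 1.96, this uses the t distribution with $n - 1$ degrees of freedom. For large groups the two critical values are nearly identical.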
# data wrangling
# calculate mean, se, and margin of error for CI formula
cdc_weight_95ci <- cdc |>
  group_by(genhlth, gender) |>
  summarise(
    mean_wt = mean(weight),
    se = sd(weight) / sqrt(n()),
    moe = qt(0.975, n() - 1) * se
  ) |>
  ungroup()
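The sketch below shows how `mean_wt` and `moe` could feed a dodged point-and-error-bar plot colored by gender. The reading of the hints is an assumption: 0.5 is taken as the dodge width and 0.1 as the error-bar width. The overall layout of the target plot is also a guess. `toy_ci` holds made-up placeholder rows shaped like `cdc_weight_95ci`, not results from the data.

```r
library(tidyverse)

# Made-up placeholder rows shaped like cdc_weight_95ci; NOT BRFSS results
toy_ci <- tibble(
  genhlth = factor(
    rep(c('excellent', 'good', 'poor'), each = 2),
    levels = c('excellent', 'good', 'poor')
  ),
  gender = rep(c('f', 'm'), times = 3),
  mean_wt = c(140, 180, 150, 190, 160, 195),
  moe = c(2, 2, 1.5, 1.5, 4, 5)
)

# assumption: 0.5 is the dodge width, shared by both layers so they line up
dodge <- position_dodge(width = 0.5)

p_ci <- ggplot(toy_ci, aes(x = genhlth, y = mean_wt, color = gender)) +
  geom_point(position = dodge) +
  # interval is mean_wt +/- moe; assumption: 0.1 is the error-bar width
  geom_errorbar(
    aes(ymin = mean_wt - moe, ymax = mean_wt + moe),
    width = 0.1,
    position = dodge
  )
p_ci
```

Using the same `position_dodge()` object in both layers keeps each error bar centered on its point.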